{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2021 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qFdPvlXBOdUN"
      },
      "source": [
        "# TensorFlow Addons Optimizers: CyclicalLearningRate"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MfBg1C5NB3X0"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/addons/tutorials/optimizers_cyclicallearningrate\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/addons/blob/master/docs/tutorials/optimizers_cyclicallearningrate.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/addons/blob/master/docs/tutorials/optimizers_cyclicallearningrate.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "      <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/addons/docs/tutorials/optimizers_cyclicallearningrate.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xHxb-dlhMIzW"
      },
      "source": [
        "## Overview\n",
        "\n",
        "This tutorial demonstrates the use of Cyclical Learning Rate from the Addons package."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IqEImEhBJWFv"
      },
      "source": [
        "## Cyclical Learning Rates\n",
        "\n",
        "It has been shown it is beneficial to adjust the learning rate as training progresses for a neural network. It has manifold benefits ranging from saddle point recovery to preventing numerical instabilities that may arise during backpropagation. But how does one know how much to adjust with respect to a particular training timestamp? In 2015, Leslie Smith noticed that you would want to increase the learning rate to traverse faster across the loss landscape but you would also want to reduce the learning rate when approaching convergence. To realize this idea, he proposed [Cyclical Learning Rates](https://arxiv.org/abs/1506.01186) (CLR) where you would adjust the learning rate with respect to the cycles of a function. For a visual demonstration, you can check out [this blog](https://www.jeremyjordan.me/nn-learning-rate/). CLR is now available as a TensorFlow API. For more details, check out the original paper [here](https://arxiv.org/abs/1506.01186). "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MUXex9ctTuDB"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "t-p545dluzjI"
      },
      "outputs": [],
      "source": [
        "!pip install -q -U tensorflow_addons"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "RPF3aDZZu8le"
      },
      "outputs": [],
      "source": [
        "from tensorflow.keras import layers\n",
        "import tensorflow_addons as tfa\n",
        "import tensorflow as tf\n",
        "\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "tf.random.set_seed(42)\n",
        "np.random.seed(42)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XLOnLLrlR-ti"
      },
      "source": [
        "## Load and prepare dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uAHLo_Ffvie3"
      },
      "outputs": [],
      "source": [
        "(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()\n",
        "\n",
        "x_train = np.expand_dims(x_train, -1)\n",
        "x_test = np.expand_dims(x_test, -1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AfUS_s-uSBvx"
      },
      "source": [
        "## Define hyperparameters"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "qumJ7KpwvvwE"
      },
      "outputs": [],
      "source": [
        "BATCH_SIZE = 64\n",
        "EPOCHS = 10\n",
        "INIT_LR = 1e-4\n",
        "MAX_LR = 1e-2"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G-x3E7RWSXWc"
      },
      "source": [
        "## Define model building and model training utilities"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vni6Gz3Dv9Db"
      },
      "outputs": [],
      "source": [
        "def get_training_model():\n",
        "    model = tf.keras.Sequential(\n",
        "        [\n",
        "            layers.Input((28, 28, 1)),\n",
        "            layers.experimental.preprocessing.Rescaling(scale=1./255),\n",
        "            layers.Conv2D(16, (5, 5), activation=\"relu\"),\n",
        "            layers.MaxPooling2D(pool_size=(2, 2)),\n",
        "            layers.Conv2D(32, (5, 5), activation=\"relu\"),\n",
        "            layers.MaxPooling2D(pool_size=(2, 2)),\n",
        "            layers.SpatialDropout2D(0.2),\n",
        "            layers.GlobalAvgPool2D(),\n",
        "            layers.Dense(128, activation=\"relu\"),\n",
        "            layers.Dense(10, activation=\"softmax\"),\n",
        "        ]\n",
        "    )\n",
        "    return model\n",
        "\n",
        "def train_model(model, optimizer):\n",
        "    model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
        "                       metrics=[\"accuracy\"])\n",
        "    history = model.fit(x_train,\n",
        "        y_train,\n",
        "        batch_size=BATCH_SIZE,\n",
        "        validation_data=(x_test, y_test),\n",
        "        epochs=EPOCHS)\n",
        "    return history"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RlRKWRWrSk_t"
      },
      "source": [
        "In the interest of reproducibility, the initial model weights are serialized which you will be using to conduct our experiments. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-JxnpsIzwCgj"
      },
      "outputs": [],
      "source": [
        "initial_model = get_training_model()\n",
        "initial_model.save(\"initial_model\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oNF33-tBSuFG"
      },
      "source": [
        "## Train a model without CLR"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Q4dEJtQzwjei"
      },
      "outputs": [],
      "source": [
        "standard_model = tf.keras.models.load_model(\"initial_model\")\n",
        "no_clr_history = train_model(standard_model, optimizer=\"sgd\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eaK0PAN-Sy6l"
      },
      "source": [
        "## Define CLR schedule\n",
        "\n",
        "The `tfa.optimizers.CyclicalLearningRate` module return a direct schedule that can be passed to an optimizer. The schedule takes a step as its input and outputs a value calculated using CLR formula as laid out in the paper. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ne0b8aGNyc3v"
      },
      "outputs": [],
      "source": [
        "steps_per_epoch = len(x_train) // BATCH_SIZE\n",
        "clr = tfa.optimizers.CyclicalLearningRate(initial_learning_rate=INIT_LR,\n",
        "    maximal_learning_rate=MAX_LR,\n",
        "    scale_fn=lambda x: 1/(2.**(x-1)),\n",
        "    step_size=2 * steps_per_epoch\n",
        ")\n",
        "optimizer = tf.keras.optimizers.SGD(clr)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "icVL3hsUTwXV"
      },
      "source": [
        "Here, you specify the lower and upper bounds of the learning rate and the schedule will *oscillate* in between that range ([1e-4, 1e-2] in this case). `scale_fn` is used to define the function that would scale up and scale down the learning rate within a given cycle. `step_size` defines the duration of a single cycle. A `step_size` of 2 means you need a total of 4 iterations to complete one cycle. The recommended value for `step_size` is as follows:\n",
        "\n",
        "`factor * steps_per_epoch` where factor lies within the [2, 8] range. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5JV_ESYqUb4d"
      },
      "source": [
        "In the same [CLR paper](https://arxiv.org/abs/1506.01186), Leslie also presented a simple and elegant method to choose the bounds for learning rate. You are encouraged to check it out as well. [This blog post](https://www.pyimagesearch.com/2019/08/05/keras-learning-rate-finder/) provides a nice introduction to the method. \n",
        "\n",
        "Below, you visualize how the `clr` schedule looks like. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "b_WRGfDx4Wwc"
      },
      "outputs": [],
      "source": [
        "step = np.arange(0, EPOCHS * steps_per_epoch)\n",
        "lr = clr(step)\n",
        "plt.plot(step, lr)\n",
        "plt.xlabel(\"Steps\")\n",
        "plt.ylabel(\"Learning Rate\")\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bBlKaAqNjHP1"
      },
      "source": [
        "In order to better visualize the effect of CLR, you can plot the schedule with an increased number of steps. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Gjhoyk-Li368"
      },
      "outputs": [],
      "source": [
        "step = np.arange(0, 100 * steps_per_epoch)\n",
        "lr = clr(step)\n",
        "plt.plot(step, lr)\n",
        "plt.xlabel(\"Steps\")\n",
        "plt.ylabel(\"Learning Rate\")\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ObYcy5NRkF4V"
      },
      "source": [
        "The function you are using in this tutorial is referred to as the `triangular2` method in the CLR paper. There are other two functions there were explored namely `triangular` and `exp` (short for exponential). "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-OV_8QVIe5m_"
      },
      "source": [
        "## Train a model with CLR"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zRSglElvy_fF"
      },
      "outputs": [],
      "source": [
        "clr_model = tf.keras.models.load_model(\"initial_model\")\n",
        "clr_history = train_model(clr_model, optimizer=optimizer)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8rhTLQJdnGfP"
      },
      "source": [
        "As expected the loss starts higher than the usual and then it stabilizes as the cycles progress. You can confirm this visually with the plots below. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LyHEgnv6e8lX"
      },
      "source": [
        "## Visualize losses"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wg0JjLwH2RTl"
      },
      "outputs": [],
      "source": [
        "(fig, ax) = plt.subplots(2, 1, figsize=(10, 8))\n",
        "\n",
        "ax[0].plot(no_clr_history.history[\"loss\"], label=\"train_loss\")\n",
        "ax[0].plot(no_clr_history.history[\"val_loss\"], label=\"val_loss\")\n",
        "ax[0].set_title(\"No CLR\")\n",
        "ax[0].set_xlabel(\"Epochs\")\n",
        "ax[0].set_ylabel(\"Loss\")\n",
        "ax[0].set_ylim([0, 2.5])\n",
        "ax[0].legend()\n",
        "\n",
        "ax[1].plot(clr_history.history[\"loss\"], label=\"train_loss\")\n",
        "ax[1].plot(clr_history.history[\"val_loss\"], label=\"val_loss\")\n",
        "ax[1].set_title(\"CLR\")\n",
        "ax[1].set_xlabel(\"Epochs\")\n",
        "ax[1].set_ylabel(\"Loss\")\n",
        "ax[1].set_ylim([0, 2.5])\n",
        "ax[1].legend()\n",
        "\n",
        "fig.tight_layout(pad=3.0)\n",
        "fig.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2EwZuz_pkqLM"
      },
      "source": [
        "Even though for this toy example, you did not see the effects of CLR much but be noted that it is one of the main ingredients behind [Super Convergence](https://arxiv.org/abs/1708.07120) and can have a [really good impact](https://www.fast.ai/2018/08/10/fastai-diu-imagenet/) when training in large-scale settings. "
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "name": "optimizers_cyclicallearningrate.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
